Compiler CH4 Grammars and Parsing

Compiler CH4 Grammars and Parsing

Parsing: Syntax Analysis

  • decides which part of the incoming token stream should be grouped together.
  • the output of parsing is some representation of a parse tree.
  • intermediate code generator transforms the parse tree into an intermediate language.

Comparisons between reqular expr. and context-free grammars

A context-free grammar:

𝑒𝑥𝑝→exp⁡〖𝑜𝑝 exp|(𝑒𝑥𝑝)〗 |𝑛𝑢𝑚𝑏𝑒𝑟
𝑜𝑝→+|−|∗

A regular expression:

𝑛𝑢𝑚𝑏𝑒𝑟=𝑑𝑖𝑔𝑖𝑡 〖𝑑𝑖𝑔𝑖𝑡〗^∗
𝑑𝑖𝑔𝑖𝑡=0|1|2|3|4|5|6|7|8|9

The major difference is that the rules of a context-free grammar are recursive.

Rules from F.A. to CFG

  1. For each state there is a nonterminal symbol.
  2. If state 𝐴 has a transition to state 𝐵 on symbol 𝑎, introduce 𝐴→𝑎𝐵.
  3. If 𝐴 goes to 𝐵 on input 𝜆, introduce 𝐴→𝐵.
  4. If 𝐴 is an accepting state, introduce 𝐴→𝜆.
  5. Make the start state of the NFA be the start symbol of the grammar.

Ex.

Why do not we use c.f.g to replac r.e?

r.e. => easy & clear description for token.
r.e. => efficient token recognizer
modularizing the components (The grammar rules use regular expressions as components)

Feature of programming language

contents:

  • declarations
  • sequential statements
  • iterative statements
  • conditional statements

Description of programming language

  • Syntax Diagrams
  • Context Free Grammars (CFG)

Capabilities of Context-free grammars

  • give precise syntactic specification of programming languages
  • a parser can be constructed automatically by CFG
  • the syntax entity specified in CFG can be used for translating into object code.
  • useful for describing nested structures such as balanced parentheses, matching begin-end’s, corresponding if-then-else, etc.

Context-Free Grammars: Concepts and Notation

  • A context-free grammar 𝐺=
    • A finite terminal vocabulary
      • The token set produced by scanner
    • A finite set of nonterminal vocabulary
      • Intermediate symbols
  • A start symbol 𝑆∈ that starts all derivations
    • Also called goal symbol
  • 𝑃, a finite set of productions (rewriting rules) of the form
    • 𝐴→𝜆 is a valid production

Derivation

  • One step derivation
    • If 𝐴→𝛾, then 𝛼𝐴𝛽⇒𝛼𝛾𝛽
  • One or more steps derivation ⟹^+
  • Zero or more steps derivation ⟹^∗

If 𝑆⟹, then 𝛽 is said to be sentential form of the CFG

  • SF(G) is the set of sentential forms of grammar G (may contain nonterminal vocabulary )

𝐿(𝐺)={ if }

𝐿(𝐺)=SF(G)∩

回溯CFG

Errors in Context-Free Grammars

unless nonterminal

ambiguous

  • rewrite CFG
  • specifying the intention(使用括號)

Transforming Extened BNF Grammars

Extended BNF≡BNF

  • Extended BNF allows
    • Square bracket []
    • Optional list {}

Parsers and Recognizers

Two general approaches to parsing

  • Top-down parser
    • Expanding the parse tree (via predictions) in a depth-first manner
    • Preorder traversal of the parse tree
    • Predictive in nature
    • lm
    • LL
  • Buttom-down parser
    • Beginning at its bottom (the leaves of the tree, which are terminal symbols) and determining the productions used to generate the leaves
    • Postorder traversal of the parse tree
    • rm
    • LR

Grammar Analysis Algorithms

Goal of this section

  • discuss a number of important analysis algo. for grammars

Cont’d

  • data structure of a grammar G

what nonterminals can derive

  • A => BCD => BC => B =>
  • An iterative marking algo.

ex.

Rule

Follow(A)

  • A is any nonterminal
  • Follow(A) is the set of terminals that may follow A in some sentential form (跟隨在A後面的終止符號)
    • $Follow(A) = {a \in V_t | S=> ^* …Aa…} \cup {if, S=> ^+aA , then , \

Compiler CH4 Grammars and Parsing
https://z-hwa.github.io/webHome/[object Object]/Compiler/Compiler-CH4-Grammars-and-Parsing/
作者
crown tako
發布於
2024年4月18日
許可協議